Spontaneous DNA damage may occur nonrandomly in the genome, especially when genome maintenance mechanisms are undermined. We developed single-strand DNA (ssDNA)-associated protein immunoprecipitation followed by sequencing (SPI-seq) to map genomic hotspots of DNA damage. We demonstrated this method with Rad52, a homologous recombination repair protein that binds to ssDNA formed at DNA lesions. In fission yeast, SPI-seq faithfully detected Rad52 enrichment at artificially induced double-strand breaks (DSBs) as well as at endogenously programmed DSBs for mating-type switching. Applying Rad52 SPI-seq to fission yeast mutants defective in the DNA helicase Pfh1 or the histone H3K56 deacetylase Hst4 led to global views of the DNA lesion hotspots emerging in these mutants. We also found serendipitously that histone dosage aberration can activate the retrotransposon Tf2 and cause the accumulation of a Tf2 cDNA species bound by Rad52. SPI-seq should be widely applicable for mapping sites of DNA damage and uncovering the causes of genome instability.



[Supplemental material is available for this article.] 

Some regions of eukaryotic genomes are more vulnerable than others when DNA integrity is challenged. In the human genome, special loci called fragile sites become prone to breakage when cells are treated with aphidicolin, a DNA polymerase inhibitor (Glover et al. 1984). In the budding yeast Saccharomyces cerevisiae, lowering DNA polymerase activity led to heightened levels of translocations at two hotspots, each composed of a pair of yeast retrotransposons (Ty elements) (Lemoine et al. 2005). It was recently shown that these and several other Ty elements are translocation hotspots even in cells with normal DNA polymerase activity (Chan and Kolodner 2011, 2012). The discovery of at-risk genomic sites parallels findings that specific mechanisms have evolved to protect them. For example, in S. cerevisiae, the DNA helicases Rrm3 and Pif1 promote replication through hard-to-replicate sites and thus prevent fork stalling that may potentially cause DNA damage (Ivessa et al. 2003; Paeschke et al. 2011).

A complete understanding of the extent of genome fragility and the molecular machineries involved in protecting at-risk sites requires systematic mapping of genomic hotspots of DNA damage. This is made possible by the advent of high-throughput technologies such as microarrays and second-generation sequencing. In particular, chromatin immunoprecipitation (ChIP) followed by microarray (ChIP-chip) analyses of a commonly used double-strand break (DSB) marker, phosphorylated histone H2A(X), have revealed, in wild-type budding yeast and fission yeast cells, the genome-wide distribution of sites prone to triggering phosphorylation of H2A(X) by the primary DNA damage checkpoint kinases (Rozenzhak et al. 2010; Szilard et al. 2010).
In both budding yeast and fission yeast, Rad52 (also known as Rad22 in the fission yeast Schizosaccharomyces pombe) is essential for DSB repair and homologous recombination (Mortensen et al. 2009). Nuclear foci formed by fluorescent protein-tagged Rad52 have been widely used to assess the level of DNA damage in live-cell imaging analysis (Lisby et al. 2001; Du et al. 2003). Unlike phospho-H2A(X), which is absent from the immediate vicinity of DSBs because of end resection (Shroff et al. 2004), Rad52 is recruited to single-strand DNA (ssDNA) exposed by resection, in a replication protein A (RPA)-dependent manner (Lisby et al. 2004). Besides DSBs, Rad52 may bind to other ssDNA-containing DNA structures, such as single-strand gaps, which can also initiate recombination (Lettier et al. 2006). Thus, mapping Rad52 binding sites is expected to be a complementary approach to phospho-H2A(X) location analysis. In this report, we present a method called SPI-seq (pronounced "Spy-seq," for ssDNA-associated protein immunoprecipitation followed by sequencing) and its application to genome-wide analysis of Rad52 binding sites in fission yeast. Importantly, SPI-seq reveals not only the locations but also the strand specificity of Rad52 binding, thereby providing extra information on the nature of the DNA lesions that lead to repair protein recruitment.

Results 

Rad52 enrichment patterns at DSBs 

We developed the SPI-seq procedure for two reasons. First, because Rad52 preferentially associates with ssDNA, the adaptor ligation method used in standard ChIP-seq protocols, which involves end repair and dA-tailing of double-strand DNA (dsDNA), is not suitable for Rad52 binding site analysis. Second, ssDNA binding by Rad52 often has strand specificity, which, if known, may help deduce the initiating event that leads to ssDNA exposure and Rad52 recruitment.
The key design of SPI-seq is an adaptor-ligation scheme based on methods used for tagging the 3' end of single-strand cDNA (Shibata et al. 2001) and the 5' end of single-strand RNA (Clepet et al. 2004). In this scheme (Fig. 1A), all DNA fragments are heat denatured to become single-stranded prior to adaptor addition. Two adaptors, one with a 5' overhang composed of six random nucleotides (N6) and the other with a 3' N6 overhang, are ligated to the ends of the ssDNA using T4 DNA ligase. This design preserves strand polarity information because the 3' end of an ssDNA can only be ligated to the adaptor with the 3' overhang, and the 5' end can only be ligated to the adaptor with the 5' overhang. We incorporated the Illumina single-end sequencing primer sequence into the adaptor tagging the 3' end. As DSB resection generates 3'-ended ssDNA, sequencing from the 3' end allows capturing the sequences immediately adjacent to the DSB.
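To make the strand logic concrete, the following Python sketch models a single read under the assumption, stated in the Fig. 1B legend, that reads map to the strand opposite the one captured by Rad52. It further assumes that the sequencing primer in the 3'-end adaptor reads into the insert, so the read begins with the complement of the fragment's 3'-terminal base. The function names are illustrative and are not part of any published SPI-seq software.

```python
# Sketch of one SPI-seq read (hypothetical helper names). Assumes the read
# starts at the 3'-end adaptor and proceeds into the insert, so it equals the
# reverse complement of the fragment's 3'-terminal bases.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_read(fragment: str, read_len: int) -> str:
    """Read derived from an ssDNA fragment written 5'->3'.

    The read's first base pairs with the fragment's 3'-terminal base.
    """
    return reverse_complement(fragment[-read_len:])


def read_strand(bound_strand: str) -> str:
    """Genome strand a read aligns to: opposite the Rad52-bound strand."""
    return {"+": "-", "-": "+"}[bound_strand]


if __name__ == "__main__":
    fragment = "AAAACCCG"  # toy ssDNA, 5'->3'
    print(simulate_read(fragment, 3))  # CGG
    print(read_strand("-"))            # +
```

Because the read always starts at the fragment's 3' terminus, the 5' end of each aligned read marks the exact 3' end of the captured ssDNA, which is what makes single-nucleotide mapping of break ends possible.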

Figure 1. SPI-seq assay detected Rad52 bound at artificially induced and naturally occurring double-strand breaks (DSBs). (A) An adaptor-ligation scheme specially designed for single-strand DNA (ssDNA). (B) The expected outcome of SPI-seq when applied to an HO-induced DSB. Because of our adaptor-ligation design, the chromatin immunoprecipitation (ChIP)-enriched sequencing reads should map to the strands opposite to the ones bound by Rad52. (C) SPI-seq analysis of HO-induced DSBs. The upper and middle panels show the same read density profiles generated by kernel density estimation. (Upper panel) The whole chromosome 1. The y-axis scale of the ChIP signal track is limited to 60 to allow visualization of the weak signals at the endogenous HO sites. (Middle panel) Close-up view of arg3::HO-site and its surrounding region. The ChIP signal track shows the full height of the ChIP signal at arg3::HO-site. Green arrowheads point to the HO-site-flanking regions where strand-specific signal loss occurred in the input, presumably due to resection. (Lower panel) ChIP-enriched reads mapped to individual nucleotide positions at the HO cleavage site, without kernel smoothing. (D) SPI-seq signals at mating-type loci. To visualize reads mapped to the duplicated mating-type cassette, nonuniquely aligned reads were randomly assigned. The units on the y-axes in C and D are reads per 10 million.

As a proof-of-principle test, we first analyzed Rad52 enrichment at an artificial DSB. Expressing the budding yeast HO endonuclease in the presence of a 24-nucleotide (nt) HO recognition sequence inserted at the arg3 locus (arg3::HO-site) on fission yeast chromosome 1 resulted in a persistent DSB and eventual cell death (Du et al. 2003, 2006). The fact that a mutation of the HO recognition sequence renders the cell resistant to continuous HO expression suggests that no other sequence in the fission yeast genome can be efficiently cut by HO (Li et al. 2012). Based on our SPI-seq design, we expected that ChIP-enriched sequencing reads mapped to the right side of the DSB should be in forward orientation (displayed as plus-strand reads on a genome browser), whereas reads mapped to the left side of the DSB should be in reverse orientation (displayed as minus-strand reads on a genome browser) (Fig. 1B).
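A minimal Python sketch of this expectation, using hypothetical read coordinates, scores how well observed reads agree with the predicted pattern (plus-strand reads right of the break, minus-strand reads left of it). Treating the break as a single coordinate ignores the 4-nt HO overhang; that simplification and the helper names are assumptions for illustration.

```python
from collections import Counter


def expected_read_strand(read_pos: int, dsb_pos: int) -> str:
    """Predicted strand of a ChIP read near a two-ended DSB.

    Resection leaves 3'-ended ssDNA on both sides; reads map to the strand
    opposite the bound one, i.e. '+' right of the break and '-' left of it.
    dsb_pos is treated as the last base left of the cut (overhang ignored).
    """
    return "+" if read_pos > dsb_pos else "-"


def strand_concordance(reads, dsb_pos: int) -> float:
    """Fraction of (position, strand) reads that match the predicted strand."""
    tally = Counter(strand == expected_read_strand(pos, dsb_pos) for pos, strand in reads)
    total = tally[True] + tally[False]
    return tally[True] / total if total else 0.0


if __name__ == "__main__":
    reads = [(120, "+"), (150, "+"), (80, "-"), (60, "-"), (130, "-")]  # hypothetical
    print(strand_concordance(reads, dsb_pos=100))  # 0.8
```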

Using strains expressing epitope-tagged Rad52, we performed SPI-seq on cells suffering HO-induced DSBs. As a control, DNA from the input was processed using the same procedure. Sequencing reads from the input were distributed evenly along all three chromosomes on both strands, suggesting that our adaptor ligation scheme did not introduce any gross bias (Fig. 1C; Supplemental Fig. S1A). In contrast, reads from the ChIP sample showed strong and specific enrichment at arg3::HO-site. As expected, Rad52-bound DNA exhibited clear strand polarity, with plus-strand SPI-seq signals on the right and minus-strand signals on the left. The extensive spreading of the ChIP signal probably resulted from continuous DNA resection during the many hours between HO cutting and sample collection. At single-nucleotide resolution, the plus-strand and minus-strand ChIP-enriched reads overlapped by 4 nt, matching precisely the 3' overhangs generated by HO (Fig. 1C, bottom panel).
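The 4-nt overlap follows from the geometry of the cut: plus-strand reads begin at the 3' terminus of the right-hand ssDNA, minus-strand reads begin at the 3' terminus of the left-hand ssDNA, and resection does not remove 3' ends. The Python sketch below, with hypothetical coordinates, recovers an overhang length from the modal read 5'-end positions on each strand. It illustrates that reasoning and is not the analysis used for Fig. 1C.

```python
from collections import Counter


def overhang_overlap(plus_5p, minus_5p) -> int:
    """Overlap (nt) of plus- and minus-strand reads at a staggered break.

    plus_5p:  genomic 5'-end (leftmost) positions of plus-strand reads
    minus_5p: genomic 5'-end (rightmost) positions of minus-strand reads
    Each strand's modal 5' end marks the 3' terminus of the ssDNA on that
    side; with 3' overhangs the two termini overlap by the overhang length.
    Positions are 1-based and inclusive.
    """
    plus_start = Counter(plus_5p).most_common(1)[0][0]
    minus_end = Counter(minus_5p).most_common(1)[0][0]
    return max(0, minus_end - plus_start + 1)


if __name__ == "__main__":
    plus = [1001, 1001, 1001, 1002]   # hypothetical plus-strand read starts
    minus = [1004, 1004, 1003]        # hypothetical minus-strand read starts
    print(overhang_overlap(plus, minus))  # 4
```

Using the mode rather than the extreme positions keeps a few reads from partially degraded or misprimed ends from shifting the estimate.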

In the HO-expressing strain, besides arg3::HO-site, we unexpectedly found four other sites with detectable Rad52 binding, two on chromosome 1 and two on chromosome 3 (Fig. 1C; Supplemental Fig. S1A,B). The SPI-seq signal patterns at these sites match that of a DSB, indicating that there are endogenous sequences cleavable by HO, albeit only inefficiently. The high-resolution, strand-specific SPI-seq data allowed us to pinpoint the locations of these endogenous HO cleavage sites, even though the enriched signals were as low as 0.36% of that at arg3::HO-site (Supplemental Fig. S1C). In all four cases, the cleavage site sequences conform to the consensus HO recognition sequence (Supplemental Fig. S1D; Nickoloff et al. 1990). The low efficiency of HO cutting at these sites may be due to either a less-than-ideal recognition sequence or a prohibitive chromatin structure, as it is known that in budding yeast, HO cannot cleave the recognition sequences at the silent donor loci HML and HMR (Strathern et al. 1982).
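The density tracks in Fig. 1 were generated by kernel density estimation and expressed in reads per 10 million. The bandwidth and exact scaling are not specified here, so the Python sketch below is only one plausible version: a Gaussian kernel with an assumed 200-bp standard deviation, applied to one strand at a time and scaled as if the library contained 10 million mapped reads. Comparing the maxima of two such tracks yields relative signal levels of the kind quoted above for the endogenous HO sites.

```python
import math


def kde_track_rpm(positions, total_reads, grid, bandwidth=200.0):
    """Gaussian-kernel read density on one strand, scaled to 10 million reads.

    positions:   read coordinates on a single strand
    total_reads: total mapped reads in the library
    grid:        coordinates at which the track is evaluated
    bandwidth:   kernel standard deviation in bp (assumed value)
    Returns density values in reads per bp per 10 million mapped reads.
    """
    scale = 1e7 / total_reads
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    track = []
    for x in grid:
        s = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions)
        track.append(s * norm * scale)
    return track


if __name__ == "__main__":
    plus = [1050, 1080, 1100, 1120]  # hypothetical plus-strand read starts
    track = kde_track_rpm(plus, total_reads=5_000_000, grid=range(900, 1301, 100))
    print([round(v, 4) for v in track])
```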

Homothallic fission yeast cells undergo mating-type switching when a strand discontinuity (called an "imprint") at the mat1 locus causes the formation of a one-ended DSB during replication (Arcangioli et al. 2007). This DSB then initiates recombination between mat1 and the homologous sequences at one of the silent donor loci, mat2-P or mat3-M. The reference genome of S. pombe is that of a heterothallic h⁻ˢ strain. In an h⁻ˢ strain, there is only one donor locus, mat2:3-M, which harbors the same "minus" cassette as the one at mat1; thus its mating type stays unchanged despite constant DSB formation and recombination (Beach and Klar 1984). When we applied SPI-seq to an h⁻ˢ strain, we observed, as expected, Rad52 binding at both mat1-M and mat2:3-M (Fig. 1D; Supplemental Fig. S2A). On the right side of the mat1 DSB, as at the HO-induced DSBs, the ChIP signal showed plus-strand polarity, presumably due to resection-exposed ssDNA. On the left side of the mat1 DSB, we observed enrichment of not only minus-strand reads, as on the left side of HO-induced DSBs, but also plus-strand reads, likely reflecting the association of Rad52 with recombination intermediates, such as the displaced strand of the extended displacement loop (D-loop) or the newly synthesized DNA unwound from the extended D-loop (Supplemental Fig. S2B).

The establishment of the mat1 imprint requires both cis-acting DNA elements (smt) and trans-acting switch (swi) genes (Arcangioli et al. 2007). Consistent with the essential role of the imprint in inducing the DSB and recombination, Rad52 binding at the mating loci was not observed in an smt0 strain and was only barely noticeable in an h⁻ˢ swi3Δ strain (Fig. 1D). Interestingly, no Rad52 binding at the mating loci was observed in heterothallic h⁺ᴺ strains (Fig. 1D). It was reported that Leupold's strain 975, from which h⁺ᴺ laboratory strains originated, had a very low imprinting level (Beach and Klar 1984). The exact cause of the imprinting defect of strain 975 is unknown.

Except for the mating-type region in h⁻ˢ strains, we did not
detect any significant Rad52 binding in the wild-type genome. We 
hypothesized that we might discover sites vulnerable to DNA 
damage by applying SPI-seq to mutants with heightened levels of 
genomic instability. In the following sections, we present two such 
applications. 

Rad52 enrichment patterns in pfh1 helicase mutants

In fission yeast, Pfh1 is the sole homolog of budding yeast Rrm3 and Pif1 and is essential for maintaining both the nuclear genome and the mitochondrial genome (Pinter et al. 2008). As shown recently by two-dimensional gel analysis, in the nucleus Pfh1 prevents replication fork stalling at hard-to-replicate sites, including RNA polymerase III (Pol III)-transcribed tRNA and 5S rRNA genes (Sabouri et al. 2012; Steinacher et al. 2012). To catalog the sites protected by Pfh1 genome-wide, we analyzed Rad52 distribution in two cold-sensitive mutants of pfh1, pfh1-R20 and pfh1-R23 (Tanaka et al. 2002; Ryu et al. 2004). In the R20 mutant at the restrictive temperature of 20°C, dozens of Rad52 binding peaks were found throughout the genome, with the strongest ones near the centromeres of chromosomes 2 and 3 (Fig. 2; Supplemental Fig. S3). The R23 mutant did not have as many conspicuous Rad52 peaks. Instead, we found a dramatic enrichment of Rad52 at the mating-type region (Supplemental Fig. S3), probably as a result of failed attempts to repair the mat1 DSB (the pfh1 strains used in the analysis were h⁻ˢ). Consistent with the SPI-seq results, we found that in the homothallic h⁹⁰ strain background, R23 but not R20 showed a strong mating-type switching defect (Supplemental Fig. S4). This phenotype of R23 resembles that of class II swi mutants (Arcangioli et al. 2007), suggesting that pfh1 may act together with the class II swi genes in resolving recombination intermediates during mating-type switching.

The SPI-seq results indicate that there are a large number of DNA lesion hotspots in pfh1-R20 at the restrictive temperature. We therefore carried out an in-depth analysis of the Rad52 binding patterns in this mutant. Apart from those found at the centromeres and the mating-type region, the Rad52 enrichment signals in pfh1-R20 can be classified into two types (Fig. 2; Supplemental Fig. S3). The first type is characterized by two closely spaced Rad52-binding zones on opposite strands, with minus-strand signal on the left and plus-strand signal on the right. Eleven sites display this type of signal, which we named twin-zone patterns; based on signal intensity, they are further classified as major or minor twin-zone patterns. Within each major twin-zone pattern, a gap devoid of Rad52 enrichment, ranging in size from 3 kb to 22 kb, is always found between the two enrichment zones. The second type consists of isolated peaks with signals mainly on one strand, which we call singleton peaks. The ChIP signals of the seven major twin-zone patterns spread tens of kb on either side, whereas the minor twin-zone patterns and singleton peaks tend to be much narrower.
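The two signal types can be expressed as a simple rule on strand-specific enriched bins. The Python sketch below is hypothetical: how bins are called enriched, the bin size, the minimum gap, and the 80% single-strand threshold for singleton peaks are assumptions rather than criteria from this study, which classified sites visually.

```python
def classify_site(plus_bins, minus_bins, min_gap=1, single_strand_frac=0.8):
    """Label a Rad52 site from strand-specific enriched bin indices.

    plus_bins / minus_bins: indices of bins called enriched on each strand.
    min_gap and single_strand_frac are hypothetical thresholds.
    Returns (label, gap_in_bins); gap_in_bins is None unless twin-zone.
    """
    if plus_bins and minus_bins:
        # Twin-zone: all minus-strand signal lies left of all plus-strand
        # signal, separated by at least min_gap empty bins.
        gap = min(plus_bins) - max(minus_bins) - 1
        if gap >= min_gap:
            return "twin-zone", gap
    n_plus, n_minus = len(plus_bins), len(minus_bins)
    total = n_plus + n_minus
    # Singleton: signal mainly on one strand.
    if total and max(n_plus, n_minus) / total >= single_strand_frac:
        return "singleton", None
    return "other", None


if __name__ == "__main__":
    print(classify_site([10, 11, 12], [2, 3, 4]))  # ('twin-zone', 5)
    print(classify_site([5, 6, 7, 8], [6]))        # ('singleton', None)
```

With 1-kb bins, the returned gap would approximate the Rad52-free interval in kb, the quantity reported above as 3 to 22 kb for major twin-zone patterns.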

To understand the nature of DNA lesion hotspots in pfh1-R20, we visually examined the relationship between Rad52 binding sites and other genomic features. Remarkably, Rad52 binding sites are almost always associated with Pol III-transcribed genes (for brevity hereafter referred to as Pol III genes). In S. pombe, there are a total of 206 Pol III genes scattered throughout the nuclear genome, including 171 tRNA genes, 33 5S rRNA genes, the U6 snRNA gene snu6, and the 7SL RNA gene srp7 (Hamada et al. 2001; Marck et al. 2006). Within each major twin-zone pattern, we could find two Pol III genes flanking the Rad52-free gap (Fig. 2; Supplemental Table S1). At the minor twin-zone patterns and singleton peaks, a Pol III gene can often be found close to the position where the Rad52 ChIP signal reaches its maximum (Fig. 2; Supplemental Fig. S3). In addition, the levels of Rad52 enrichment at centromere-flanking regions largely correlate with the numbers of Pol III genes near these regions (Supplemental Fig. S5). Thus, DNA lesion hotspots in pfh1-R20 are intimately related to Pol III genes, consistent with the known role of Pfh1 in overcoming replication blocks at Pol III genes (Sabouri et al. 2012; Steinacher et al. 2012).

Figure 2. Rad52 enrichment patterns in pfh1-R20. Rad52 SPI-seq read distribution along chromosome 1 in pfh1-R20 at 20°C. The units on the y-axes are reads per 10 million. The positions and naming of replication origins are as described by Hayashi et al. (2007).

We hypothesized that the extensive spreading of the ChIP signal at major twin-zone patterns may result from two Pol III genes acting synergistically. To test this idea, we deleted the left-side Pol III gene at major twin-zone pattern 1. As predicted, the deletion not only abolished Rad52 enrichment on the left side but also strongly reduced the intensity and the spreading of ChIP signals on the right side, essentially converting the major twin-zone pattern into a singleton peak (Supplemental Fig. S6).

The narrower Rad52 distribution at singleton peaks and minor twin-zone patterns made them amenable to software-based peak detection. We therefore searched for peaks with the software MACS (Zhang et al. 2008), excluding centromere-flanking regions, the mating-type region, and regions containing major twin-zone patterns. Using a P-value cutoff of 10⁻⁷, we found 53 peaks with specific Rad52 enrichment in pfh1-R20 (see Supplemental Methods). Consistent with our visual impression, 39 (73%) of these peaks overlap with Pol III genes, including 32 singleton peaks and seven peaks associated with minor twin-zone patterns, and 11 of the 14 remaining peaks are adjacent to Pol III genes (Supplemental Table S2). Permutation tests showed a significant association of these peaks with Pol III genes (P < 0.001), but not with sn/snoRNA genes, highly transcribed protein-coding genes, or predicted G-quadruplexes (Fig. 3A). The peaks are not associated with the COC loci, where TFIIIC but not Pol III binds (Noma et al. 2006). We did observe a significant association of the peaks with replication origins, but this may be a secondary consequence of the association of Rad52 peaks with Pol III genes, as a great majority of peak-associated origins are adjacent to Pol III genes (Fig. 3B).

For the 39 peaks overlapping with Pol III genes, peak polarity correlates strongly with the transcription direction of the Pol III genes: the SPI-seq signals of 36 peaks are mainly on the nontemplate strand (P = 3.61 × 10⁻⁸, McNemar's test) (Supplemental Table S2). To visualize this correlation directly, we displayed the ChIP signals of the 32 singleton peaks overlapping with Pol III genes as heatmaps (Fig. 3C). Averaging the signals from these peaks revealed the same trend (Fig. 3D). We suspect this correlation is related to the observations that in Pfh1-depleted cells, tRNA genes became polar fork barriers that caused a stronger elevation of replication pausing and recombination when replication and transcription move in opposite directions (Sabouri et al. 2012; Steinacher et al. 2012).
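Treating the 36 nontemplate-strand and 3 template-strand peaks as the discordant counts, the exact (binomial) form of McNemar's test reproduces the reported P value. The Python sketch below assumes that this exact two-sided form was used; it needs only the standard library.

```python
from math import comb


def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact McNemar P from discordant counts b and c.

    Under the null each discordant observation falls either way with
    probability 1/2, so P is twice the smaller binomial tail, capped at 1.
    """
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # 36 peaks mainly on the nontemplate strand vs. 3 on the template strand
    print(f"{exact_mcnemar_p(36, 3):.3g}")  # 3.61e-08
```

The exact form is preferable to the chi-square approximation here because one of the discordant counts is very small.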

There are a total of 109 Pol III genes in the genomic regions 
searched by MACS, and more than half of them neither overlap 
nor adjoin Rad52 peaks. As these nonpeak genes are occupied 
by Pol III to the same extent as the peak genes (Noma et al. 2006), 
the lack of Rad52 enrichment cannot be attributed to lower Pol III
occupancy. When we compared gene-origin distances for the 
nonpeak genes versus singleton-peak genes, we found that the 
latter tend to be closer to the downstream replication origins, 
whereas the former tend to be closer to the upstream origins 
(Fig. 3E,F). Thus, the selective binding of Rad52 at peak genes may 
also be due to the polar fork blocking effect of Pol III genes (see 
Supplemental Discussion). 
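The gene-origin comparison needs, for each Pol III gene, the distance to the nearest origin on each side, labeled by transcription direction. The Python sketch below assumes sorted origin coordinates and a convention in which "upstream" and "downstream" follow the gene's transcription direction; both the helper names and that convention are illustrative assumptions.

```python
from bisect import bisect_left


def origin_distances(gene_pos, gene_strand, origins):
    """Distances from a gene to its nearest upstream and downstream origins.

    origins must be sorted coordinates. Upstream/downstream follow the
    gene's transcription direction (assumed convention): for a '+' gene,
    upstream is to the left; for a '-' gene, to the right.
    Returns (upstream_dist, downstream_dist), with None where no origin exists.
    """
    i = bisect_left(origins, gene_pos)
    left = gene_pos - origins[i - 1] if i > 0 else None
    right = origins[i] - gene_pos if i < len(origins) else None
    return (left, right) if gene_strand == "+" else (right, left)


def closer_to_downstream(gene_pos, gene_strand, origins):
    """True if the downstream origin is nearer than the upstream one."""
    up, down = origin_distances(gene_pos, gene_strand, origins)
    if down is None:
        return False
    return up is None or down < up


if __name__ == "__main__":
    origins = [1000, 5000, 9000]  # hypothetical origin coordinates
    print(origin_distances(4000, "+", origins))      # (3000, 1000)
    print(closer_to_downstream(4000, "-", origins))  # False
```

Tabulating closer_to_downstream separately for peak and nonpeak genes gives the kind of contrast summarized in Fig. 3E,F.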

The confined and largely unimodal distribution of SPI-seq signals at the singleton peaks allowed us to assess where in a Pol III gene Rad52 enrichment preferentially occurs. As seen in the heatmap and the average plot (Fig. 3C,D), the highest levels of Rad52 enrichment were observed at a position just 5' of the mature RNA sequence, where the Pol III initiation complex binds (Hamada et al. 2001). Thus, we propose that it is the Pol III initiation complex or the poised polymerase complex, rather than the transcribing Pol III machinery, that poses the strongest threat to incoming replication forks when Pfh1 function is compromised. This model is consistent with the report that in S. cerevisiae, fork pausing at tRNA genes is due to the presence of the transcription initiation complex rather than the act of transcription (Ivessa et al. 2003).
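Heatmaps and average plots of this kind require orienting every gene in the same transcriptional direction before averaging. The Python sketch below assumes each peak's signal has already been binned across a fixed window centered on the gene; minus-strand genes are reversed so that the first bin is always upstream. It is a generic metagene average, not the exact procedure behind Fig. 3C,D.

```python
def oriented_average(profiles):
    """Average per-gene signal profiles after orienting them 5' -> 3'.

    profiles: list of (strand, values); values are equal-length lists of
    binned signal across a window centered on each gene, ordered left to
    right in genome coordinates. Minus-strand genes are reversed so that
    bin 0 is always upstream of the gene.
    """
    if not profiles:
        return []
    n = len(profiles[0][1])
    totals = [0.0] * n
    for strand, values in profiles:
        if len(values) != n:
            raise ValueError("all profiles must have the same number of bins")
        oriented = values if strand == "+" else values[::-1]
        for i, v in enumerate(oriented):
            totals[i] += v
    return [t / len(profiles) for t in totals]


if __name__ == "__main__":
    demo = [("+", [0, 5, 1]), ("-", [1, 3, 0])]  # hypothetical binned signals
    print(oriented_average(demo))  # [0.0, 4.0, 1.0]
```

The index of the maximum in the averaged profile then indicates where, relative to the gene, enrichment peaks; template and nontemplate strand signals can be separated the same way before averaging.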

Replication-associated Rad52 enrichment patterns 
in the hst4 mutant 

Acetylation of histone H3 lysine 56 (H3K56Ac) preferentially occurs during S phase and in response to DNA damage in yeasts and humans (Costelloe and Lowndes 2010). In S. pombe, H3K56 acetylation requires the acetyltransferase Rtt109 (Xhemalce et al. 2007). Deacetylation of H3K56Ac is mediated by the sirtuin-family deacetylase Hst4. Loss of Hst4 caused a growth defect and hypersensitivity to genotoxic agents (Haldar and Kamakaka 2008). To better understand the consequences of H3K56 hyperacetylation, we examined the Rad52 enrichment profile in hst4Δ by SPI-seq.

In hst4Δ, the Rad52 SPI-seq signal appeared as dozens of distinct peaks on each chromosome (Fig. 4A), with half-peak widths in the range of kilobases to tens of kilobases, significantly wider than the singleton peaks seen in pfh1-R20. Visual inspection revealed that peak locations correlate with replication origins in a specific manner; namely, the summit of a plus-strand peak is often found in the left vicinity of the nearest origin, whereas the summit of a minus-strand peak is usually seen in the right vicinity of the nearest origin. Furthermore, the high-intensity peaks seem to reside preferentially within the longer inter-origin intervals. Computational analysis confirmed the visual impression (Fig. 4B). In average plots, Rad52 enrichment is more pronounced for longer inter-origin intervals, within which the ChIP signal reaches its height near one of the flanking origins: the minus-strand peaks near the left origin and the plus-strand peaks near the right origin. The Rad52-bound DNA in hst4Δ was most likely single-stranded, as we detected similar enrichment patterns of RPA in this mutant (Supplemental Fig. S7).

Figure 4. Replication-coupled Rad52 enrichment patterns in hst4Δ. (A) Rad52 SPI-seq read distribution along chromosome 3. (B) Averaged SPI-seq signal within inter-origin regions of different sizes. Inter-origin regions were classified into three groups according to their sizes. We divided each inter-
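Averaging signal across inter-origin intervals of different lengths requires rescaling each interval to a common number of bins. The Python sketch below does this and reports reads per kb so that long and short intervals are comparable. The 20-kb and 50-kb cutoffs defining three size groups and the bin count are hypothetical, since the Fig. 4B legend is truncated here.

```python
def scaled_profile(read_positions, left, right, n_bins=10):
    """Reads per kb in n_bins equal-width fractions of one inter-origin interval."""
    length = right - left
    if length <= 0:
        raise ValueError("right must be greater than left")
    counts = [0] * n_bins
    for p in read_positions:
        if left <= p < right:
            counts[(p - left) * n_bins // length] += 1
    bin_kb = length / n_bins / 1000
    return [c / bin_kb for c in counts]


def average_by_size(intervals, edges=(20_000, 50_000), n_bins=10):
    """Average scaled profiles within size groups 0, 1, 2 (hypothetical edges).

    intervals: list of (left_origin, right_origin, read_positions).
    """
    groups = {}
    for left, right, reads in intervals:
        group = sum(right - left >= e for e in edges)  # 0, 1 or 2
        groups.setdefault(group, []).append(scaled_profile(reads, left, right, n_bins))
    return {
        g: [sum(col) / len(col) for col in zip(*profs)]
        for g, profs in sorted(groups.items())
    }
```

Running this separately for plus-strand and minus-strand reads would show whether each strand's signal rises toward the right or the left flanking origin.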

To examine whether Rad52 binding is influenced by cell cycle phase, we synchronized cells in G2 using the cdc25-22 temperature-sensitive allele and monitored Rad52 enrichment at different time points after release from the G2 arrest (Fig. 4C). In S. pombe, the onset of septation (formation of the division septum) coincides with the beginning of S phase. We therefore used the septation index as an indicator of cell cycle progression; it showed that cells began to enter S phase between 40 and 60 min after G2 release. No Rad52 enrichment was detected at pre-S-phase time points (0–40 min), and the Rad52 enrichment patterns emerging at early S-phase time points (60–80 min) were sharper and closer to the replication origins than those found at later time points (100–120 min), suggesting that Rad52 binding occurs during replication and extends away from the origin as replication progresses.

The replication-associated Rad52 accumulation in hst4Δ prompted us to examine two other mutants with a replication-related defect, swi1Δ and swi3Δ. Fission yeast Swi1 and Swi3 proteins (orthologs of Tof1 and Csm3 in budding yeast, and of Timeless and Tipin in mammals, respectively) form a fork protection complex, and their mutants accumulate Rad52 foci during unperturbed replication, probably as a consequence of ssDNA gaps formed near replication forks (Noguchi et al. 2004). In swi1Δ and swi3Δ, we found Rad52 peak patterns similar to those in hst4Δ (Fig. 4D). Thus, fork-associated ssDNA may be a common cause of Rad52 accumulation in these mutants (see Supplemental Discussion). The Rad52 peaks were at positions closer to the origins in hst4Δ than in swi1Δ and swi3Δ, implying that forks may encounter problems earlier or more frequently in hst4Δ.

Epistasis analysis showed that the Rad52 enrichment pattern in hst4Δ is due to H3K56 hyperacetylation mediated by Rtt109, as deleting rtt109 eliminated the Rad52 pattern (Supplemental Fig. S8A), and so did the H3K56R mutation (Supplemental Fig. S8B). The K56A and K56Q mutations that mimic the charge state of acetylated lysine had the same effect, suggesting that the functional consequence of K56 acetylation is not merely due to charge neutralization (Supplemental Fig. S8B).

In S. cerevisiae, genetic analysis suggested that Rtt101, Mms22, and Mms1, which are components of a potential E3 ubiquitin ligase complex, act downstream of H3K56Ac (Collins et al. 2007). In S. pombe, Mms22 (also known as Mus7) and Mms1 function together in genome stability maintenance (Dovey and Russell 2007; Yokoyama et al. 2007; Dovey et al. 2009). It was recently shown that mms1Δ can suppress the growth defect of hst4Δ (Vejrup-Hansen et al. 2011). We found that mms22Δ suppressed the Rad52 enrichment pattern of hst4Δ (Supplemental Fig. S8C), suggesting that in S. pombe, the Mms22-Mms1 complex also acts as a downstream effector of H3K56Ac. Consistent with this model, we found that the synthetic lethality/sickness of the hst4Δ cds1Δ double mutant can be suppressed by either rtt109Δ or mms22Δ (Supplemental Fig. S8D).

Histone dosage aberration causes retrotransposon Tf2 
activation and the accumulation of a Tf2 cDNA species 
bound by Rad52 

In fission yeast, there are three pairs of histone H3-H4 genes (hht1-hhf1, hht2-hhf2, and hht3-hhf3). To generate the histone H3K56 point mutations shown in Supplemental Figure S8B, we employed a strain in which two of the gene pairs, hht1-hhf1 and hht3-hhf3, were deleted (hereafter referred to as Δ1Δ3) (Mellone et al. 2003).



Consistent with the observations that the Δ1Δ3 strain lacks any overt growth defect or chromosome instability (Mellone et al. 2003), we did not observe any notable Rad52 enrichment patterns in this strain when uniquely mapped reads were displayed on a genome browser (Fig. 5A). However, when nonuniquely mapped reads were randomly assigned and visualized, we were surprised to see strong strand-specific enrichment of Rad52 at the 13 Tf2 retrotransposons, which share almost identical DNA sequences (Fig. 5A). Tf2 belongs to the Ty3/Gypsy family of long terminal repeat (LTR)-containing retrotransposons and is the only type of transposable element capable of remobilization in standard S. pombe strains (Bowen et al. 2003). ChIP-enriched reads can also be aligned to the Tf-fragment 1 element and a number of solo LTRs, probably owing to their sequence similarity to the full-length Tf2. Thus, our SPI-seq analysis fortuitously identified a novel connection between histone genes and the Tf2 retrotransposon.
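Random assignment of nonuniquely aligned reads, used here and in Fig. 1D, can be sketched as choosing one of a read's equally good alignments uniformly at random. The Python sketch below assumes alignments have already been collected per read ID; the data layout and function names are hypothetical, and a fixed seed is used only to make the result reproducible.

```python
import random
from collections import Counter


def assign_multimappers(read_hits, seed=0):
    """Pick one alignment per read, uniformly at random among its hits.

    read_hits: dict of read ID -> list of (chrom, pos, strand) alignments.
    Unique reads keep their only hit; reads without hits are dropped.
    """
    rng = random.Random(seed)
    return {rid: rng.choice(hits) for rid, hits in read_hits.items() if hits}


def coverage(assignments):
    """Count assigned reads per (chrom, pos, strand)."""
    return Counter(assignments.values())


if __name__ == "__main__":
    hits = {
        "r1": [("chr1", 100, "+")],                      # unique
        "r2": [("chr1", 500, "+"), ("chr2", 900, "+")],  # repeat copies
        "r3": [],                                        # unaligned
    }
    print(coverage(assign_multimappers(hits)))
```

Because each read lands on exactly one copy, signal from a family such as Tf2 is spread across its copies rather than discarded, while strand information is preserved for every read.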

If Rad52 binding occurred on the Tf2 genomic DNA in the Δ1Δ3 strain, we would expect to see some ChIP signal at Tf2-flanking regions. However, virtually no enriched reads could be detected on the Tf2-flanking genomic DNA (Fig. 5A). Thus, Rad52 most likely bound to Tf2 cDNA. Furthermore, within the internal region between the two LTRs, SPI-seq signals were found exclusively on the nontemplate strand except at the primer-binding site (PBS), where a strong peak was seen on the template strand (Fig. 5B). These observations, together with the fact that the signals on the LTRs were significantly lower than those in the internal region, led us to propose that the Rad52-bound cDNA may represent a reverse transcription intermediate in which the LTR region is double-stranded and the rest of the cDNA remains single-stranded (Supplemental Fig. S9).

To verify the SPI-seq results, we performed ChIP-PCR analysis using strains in which one Tf2 copy is tagged with a neo marker (Sehgal et al. 2007). Indeed, in Δ1Δ3, we detected Rad52 enrichment on the neo marker gene (Fig. 5C).

In budding yeast, histone gene deletions increase transcription of the retrotransposon Ty1 (Nyswaner et al. 2008). To test whether Tf2 transcription is altered in Δ1Δ3, we performed Northern blot analysis (Fig. 5D). Tf2 transcripts were hardly detectable in strains in which none or only one of the three pairs of H3-H4 genes was deleted, yet their abundance increased greatly in all three mutants with two pairs of H3-H4 genes deleted, especially Δ1Δ3.

The GATA-type transcription factor Ams2 is required for S-phase activation of H3 and H4 genes in fission yeast (Takayama and Takahashi 2007). It was therefore not surprising that we also observed upregulation of Tf2 transcription in ams2Δ (Fig. 5D). Together, these findings suggest that Tf2 silencing in fission yeast requires normal levels of histone expression.

To see whether Tf2 derepression is always associated with Rad52 binding to Tf2 cDNA, we carried out SPI-seq analysis on ams2Δ, as well as on two other mutants defective in Tf2 silencing, the HIRA histone chaperone mutant slm9Δ and the CENP-B mutant cbp1Δ (Greenall et al. 2006; Cam et al. 2008). In both ams2Δ and slm9Δ, we observed Rad52 enrichment on Tf2, with patterns similar to those in Δ1Δ3 (Fig. 5E). Surprisingly, no Rad52 enrichment on Tf2 was observed for cbp1Δ (Fig. 5E). ChIP-qPCR analysis confirmed the SPI-seq results (Supplemental Fig. S10A). The lack of Rad52 binding to Tf2 cDNA in cbp1Δ correlated with reduced accumulation of Tf2 cDNA, even though the level of Tf2 RNA was similar to that of the other mutants (Supplemental Fig. S10B,C). The reason for the unique phenotype of cbp1Δ remains to be determined.
Figure 5. Rad52 enrichment on retrotransposon cDNA in histone dosage mutants. (A) Rad52 SPI-seq read distribution along chromosome 1 in Δ1Δ3. Enrichment at Tf2 retrotransposons was detected when nonuniquely aligned reads were randomly assigned and visualized. (B) Rad52 SPI-seq reads from

Pronounced binding of Rad52 to Tf2 cDNA is associated with a dramatic increase in transposition: the Tf2 transposition frequencies in Δ1Δ3 and ams2Δ were more than 80-fold higher than that in the wild type and more than 10-fold higher than that in cbp1Δ (Fig. 5F). Tf2 mobilization is often mediated by integrase-independent recombination between its cDNA and pre-existing genomic copies of Tf2 (Hoff et al. 1998; Sehgal et al. 2007). We speculate that the binding of Rad52 to an immature form of Tf2 cDNA may contribute to this main mode of Tf2 transposition.

Discussion 

We have developed a method for revealing the genome-wide
locations of DNA repair proteins associated with ssDNA. Using
this method, we identified distinct Rad52 accumulation patterns
in several mutants, thereby generating high-resolution global
views of the DNA lesion hotspots that emerge when genome
maintenance factors are impaired.

Comparison with other methods that can reveal DNA lesion hotspots

There are two types of assays that can reveal DNA lesion hotspots.
The first type maps the breakpoint junctions of chromosomal
rearrangements. Because rearrangements are repair products of
DNA lesions, these methods infer the locations of the initiating
lesions only indirectly. In budding yeast, such an approach led to
the discovery of the Ty element-associated fragile sites (Lemoine
et al. 2005). One important method in this category is the gross
chromosomal rearrangement (GCR) assay developed by the
Kolodner laboratory, which is based on selection against two
markers near the left end of chromosome V in budding yeast
(Chen and Kolodner 1999). The rearrangement breakpoints in
hundreds of GCR-containing strains have been mapped, although
mapping them one strain at a time is time-consuming and costly
(Putnam et al. 2005; Chan and Kolodner 2012).

A limitation of growth selection-based rearrangement detection
systems is the difficulty of capturing rearrangement events that
compromise cell growth. In addition, the locations of the
selection markers may affect the types of rearrangement that can
be recovered. For example, the rrm3 mutant has the same GCR rate
as the wild type despite the known role of Rrm3 in preventing
replication stalling at hard-to-replicate sites, possibly because
such sites are absent from the 12-kb region monitored by the GCR
assay (Schmidt and Kolodner 2006). Recently, deep-sequencing-based
methods have been used to identify tens of thousands of
translocation junctions simultaneously without growth selection
(Chiarle et al. 2011; Klein et al. 2011). These approaches rely on
an artificial DSB to induce and map translocations and thus are
still not bias-free.

Our SPI-seq method belongs to the second type of approach,
which directly monitors the physical locations of DNA damage
and thus does not depend on DNA repair outcomes.
Microarray-based mapping of ssDNA has been achieved through
random priming without denaturation (Feng et al. 2006) or benzoyl
naphthoyl DEAE (BND) cellulose enrichment (Buhler et al. 2007).
DSBs have been mapped by microarray using an end-labeling
procedure (Feng et al. 2011). These DNA-targeted assays are prone
to in vitro artifacts and noise caused by DNA denaturation and
breakage. Newer protocols that lyse cells and label DNA in agarose
plugs minimize, but may not completely eliminate, these problems
(Feng et al. 2011).

Unlike assays that directly target DNA, SPI-seq and other
ChIP-based methods capture lesion-associated DNA indirectly, by
enriching proteins bound at lesion sites. These methods are
relatively insensitive to in vitro manipulations that may perturb
DNA conformation, because the protein-DNA interaction is fixed by
formaldehyde before cell lysis. However, because lesion-associated
proteins vary in their binding specificity and distribution
patterns, ChIP data reflect not only the locations of the DNA
lesions but also the properties of the protein being enriched. For
example, phospho-H2A(X) is influenced by the many factors that
govern the distribution and activity of the ATM/ATR checkpoint
kinases, so it remains uncertain whether all sites enriched for
phospho-H2A(X) correspond to DSBs.

Owing to its superior resolution and lower cost, ChIP-seq is
replacing ChIP-chip as the method of choice for location analysis
of DNA-binding proteins. ssDNA-binding proteins pose a challenge
for ChIP-seq because routine sequencing library preparation
procedures cannot ligate adaptors to ssDNA. Moreover, the strand
polarity of ssDNA offers an extra dimension of spatial information
that, to our knowledge, cannot be captured by any existing
approach. Our SPI-seq method is tailored to ssDNA-binding
proteins and thus can report the true locations of ssDNA and,
more importantly, reveal their strand polarity. Because SPI-seq
does not directly monitor ssDNA, in cases where signals are
detected on both strands (e.g., the mat1 locus), SPI-seq data
alone cannot distinguish between binding to ssDNA of both
polarities and binding to dsDNA.
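One way to see how the SPI-seq adaptors described in Methods can accommodate ssDNA is to check their base-pairing. The Python sketch below is our illustration, not the authors' code. It rebuilds the four adaptor oligos from the sequences given in Methods, using a hypothetical index `AGTC` in place of XXXX. It then confirms that each adaptor anneals into a duplex with a 6-nt random overhang (3' on oligo A, 5' on oligo C), and that the sequencing primer binds oligo B immediately 3' of the index. The helper names (`revcomp`, `adaptor1_pairs`, and so on) are hypothetical.

```python
"""Check the base-pairing implied by the SPI-seq adaptor sequences in Methods.

Assumptions (ours): INDEX is a hypothetical stand-in for the 4-nt index XXXX,
and the 3'-NH2 blocking groups and 5' phosphate are not modeled.
"""

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string; N (random base) pairs with N."""
    return seq.translate(_COMPLEMENT)[::-1]


INDEX = "AGTC"  # hypothetical, deliberately non-palindromic index

# Adaptor 1: oligo A ends in a 3' random hexamer overhang.
OLIGO_A = "GCTCTTCCGATC" + INDEX + "C" + "NNNNNN"
# Base-pairing requires oligo B to carry the reverse-complemented index
# (also written XXXX in Methods).
OLIGO_B = "G" + revcomp(INDEX) + "GATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

# Adaptor 2: oligo C begins with a 5' random hexamer overhang.
OLIGO_C = "NNNNNN" + "GTTCAGAGTTCTGCGACAGGAGAG"
OLIGO_D = "CAAGCAGAAGACGGCATACGACCTCTCCTGTCGCAGAACTCTGAAC"

SEQ_PRIMER = "ACACTCTTTCCCTACACGACGCTCTTCCGATC"


def adaptor1_pairs() -> bool:
    """Oligo B's 5' end pairs flush with oligo A up to the 3' hexamer."""
    return OLIGO_B.startswith(revcomp(OLIGO_A[:-6]))


def adaptor2_pairs() -> bool:
    """Oligo D's 3' end pairs with oligo C downstream of its 5' hexamer."""
    return OLIGO_D.endswith(revcomp(OLIGO_C[6:]))


def primer_site_in_oligo_b() -> bool:
    """The sequencing primer is complementary to oligo B 3' of the index."""
    return SEQ_PRIMER == revcomp(OLIGO_B[1 + len(INDEX):])


def expected_read_prefix() -> str:
    """Bases synthesized first from the sequencing primer: the index, then C."""
    return revcomp(OLIGO_B[:1 + len(INDEX)])


if __name__ == "__main__":
    print(adaptor1_pairs(), adaptor2_pairs(), primer_site_in_oligo_b())
    print(expected_read_prefix())
```

On our reading of the sequences, the 3' overhang of adaptor 1 can splint only the 3' end of a captured strand to oligo B. Likewise, the 5' overhang of adaptor 2 can splint only its 5' end to oligo D. This directional ligation is one plausible basis for the strand information discussed above, but it is our interpretation rather than a statement from the text.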

Potential applications 

SPI-seq can easily be adapted to proteins other than Rad52 and
RPA, and to organisms other than yeast. For example, in human
cells, repair proteins such as RAD51, RAD52, BRCA2, and FANCD2, as
well as checkpoint signaling proteins such as ATR and ATRIP,
accumulate on ssDNA upon DSB formation (Bekker-Jensen et al.
2006). As long as suitable antibodies are available for ChIP, the
genome-wide locations of any of these proteins can be analyzed
using SPI-seq. In addition, the adaptor-ligation scheme we
designed can also be applied to ChIP-enriched dsDNA, allowing
other DNA lesion markers such as phospho-H2A(X) to be profiled
by the same procedure. Besides mapping spontaneous DNA lesion
hotspots, SPI-seq should also be useful for cataloging in vivo
cleavage sites of genome-editing nucleases, such as zinc-finger
nucleases (ZFNs) and transcription activator-like effector
nucleases (TALENs).

Methods 

Strains and culturing conditions 

The strains used in this study are listed in Supplemental Table S3.
Cells used for ChIP assays were cultured in EMM (Edinburgh Minimal
Medium) with necessary supplements at 30°C unless otherwise
noted. For the HO experiment, cells were shifted to thiamine-free
medium for 17 h before formaldehyde crosslinking. For pfh1
mutants, cells were cultured at the restrictive temperature for 18 h
before crosslinking.

SPI-seq 

Formaldehyde crosslinking was performed for 1 h at 4°C. Cells
were lysed by glass bead beating, and chromatin was fragmented by
sonication. Antibody-enriched chromatin fragments were
reverse-crosslinked overnight at 65°C. ChIP DNA or input DNA was
predenatured at 95°C and quenched on ice. The DNA was mixed with
two pre-annealed adaptors, each at a final concentration of 1.5 μM.
Adaptor 1 was composed of oligo A
(5'-GCTCTTCCGATCXXXXCNNNNNN-NH2-3') and oligo B
(5'-p-GXXXXGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT-NH2-3'). XXXX in the
oligo sequences denotes 4-nucleotide multiplex indexes. Adaptor 2
was composed of oligo C (5'-NNNNNNGTTCAGAGTTCTGCGACAGGAGAG-NH2-3')
and oligo D (5'-CAAGCAGAAGACGGCATACGACCTCTCCTGTCGCAGAACTCTGAAC-3').
The ligation reaction was conducted using the Quick Ligation Kit
(New England Biolabs) overnight at 16°C. The ligation product was
then separated on a Low Range Ultra agarose gel (Bio-Rad), and DNA
in the 250- to 500-bp range was retrieved with the GFX PCR DNA and
Gel Band Purification Kit (GE Healthcare). The gel-purified DNA was
amplified by PCR for 25 cycles using primers P1
(5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGA-3') and P2
(5'-CAAGCAGAAGACGGCATACGA-3'). The PCR product was processed by
gel-based size selection, again retaining DNA in the 250- to 500-bp
range. Equal amounts of DNA tagged with different multiplex indexes
were mixed and sequenced on an Illumina GA-II instrument using the
sequencing primer 5'-ACACTCTTTCCCTACACGACGCTCTTCCGATC-3'.
Sequencing reads were aligned to the genome using SOAP2 (Li et al.
2009). The read alignment output was converted to a read density
profile using kernel density estimation (Valouev et al. 2008) and
visualized with Integrated Genome Browser (Nicol et al. 2009).
Unless otherwise noted, the kernel density bandwidth was set at 30 bp.
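The two computational steps above, splitting multiplexed reads by index and converting aligned reads into a density profile, can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline. Several details are our own choices: the read layout (a 4-nt index followed by a C, which we infer from oligo B and the sequencing primer), the Gaussian kernel, the cutoff at 4 bandwidths, and the separate '+' and '-' strand profiles. SOAP2 alignment is not reproduced, so aligned positions and strands are supplied directly. All function names are hypothetical.

```python
"""Sketch of two SPI-seq read-processing steps: index demultiplexing and
per-strand kernel density profiles.

Assumptions (ours, not the paper's): each raw read begins with the 4-nt index
followed by a C; a Gaussian kernel truncated at 4 bandwidths is used; aligned
positions and strands are given directly because alignment is not shown.
"""
import math
from collections import defaultdict

INDEX_LEN = 4  # length of the multiplex index (XXXX in Methods)


def demultiplex(reads, known_indexes):
    """Group reads by leading index, trimming the index and the C spacer.

    Reads with an unknown index or a non-C fifth base are discarded.
    """
    groups = defaultdict(list)
    for read in reads:
        idx = read[:INDEX_LEN]
        spacer = read[INDEX_LEN:INDEX_LEN + 1]
        if idx in known_indexes and spacer == "C":
            groups[idx].append(read[INDEX_LEN + 1:])
    return dict(groups)


def kernel_density(positions, start, end, bandwidth=30.0):
    """Gaussian kernel density of read positions at each base in [start, end).

    Each read contributes a kernel of total mass ~1, so the profile sums to
    roughly the number of reads lying well inside the window.
    """
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    cutoff = 4.0 * bandwidth  # contributions beyond 4 bandwidths are ignored
    profile = [0.0] * (end - start)
    for p in positions:
        lo = max(start, math.floor(p - cutoff))
        hi = min(end, math.ceil(p + cutoff) + 1)
        for x in range(lo, hi):
            z = (x - p) / bandwidth
            profile[x - start] += norm * math.exp(-0.5 * z * z)
    return profile


def strand_profiles(alignments, start, end, bandwidth=30.0):
    """Separate density profiles for reads aligned to the '+' and '-' strands."""
    by_strand = {"+": [], "-": []}
    for pos, strand in alignments:
        by_strand[strand].append(pos)
    return {s: kernel_density(ps, start, end, bandwidth)
            for s, ps in by_strand.items()}


if __name__ == "__main__":
    reads = ["AGTCCTTGACA", "TTGACGGATT", "AGTCGAAAAA"]
    print(demultiplex(reads, {"AGTC", "TTGA"}))
    profiles = strand_profiles([(120, "+"), (135, "+"), (400, "-")], 0, 600)
    print(round(sum(profiles["+"]), 3), round(sum(profiles["-"]), 3))
```

Keeping the two strands in separate profiles preserves the polarity information that SPI-seq is designed to capture. The 4-bandwidth cutoff discards only a negligible fraction of each kernel's mass in exchange for speed.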

Data access 

The sequencing data described here have been deposited in the NCBI
Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/)
under accession number GSE39166.